

## CLAIMS

1. A method to control and monitor a hybrid cooling system for cooling multiple logic modules with different heat loads to the same temperature while maintaining system clock speeds as fast as viable, the method comprising:

cooling the multiple logic modules with a single refrigerant unit having a backup air cooling system;

monitoring temperatures of any logic module subject to temperature changes;

controlling a first PID loop of electronic expansion valves, each expansion valve being in fluid communication with a corresponding evaporator and controlling the temperature of a corresponding operating logic module, each logic module having a heat load cooled by at least one of the single refrigerant unit and the backup air cooling system; and

controlling a second PID loop of a compressor speed of the single refrigerant unit to extend refrigeration capacity and control for cooling the multiple logic modules once an expansion valve has reached the maximum cooling capacity that the expansion valve can deliver.
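The two cascaded loops of claim 1 can be illustrated with a minimal sketch. It assumes a plain positional PID for each loop, a valve opening expressed as a fraction (0.0 closed, 1.0 fully open), and a compressor output expressed as a speed increment above some base speed. The names `PID` and `control_step`, the gains, and the rule that the compressor responds to the warmest module whose valve is already fully open are hypothetical choices for illustration, not the claimed implementation.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PID:
    """Textbook positional PID controller with output clamped to [out_min, out_max]."""
    kp: float
    ki: float
    kd: float
    out_min: float
    out_max: float
    integral: float = 0.0
    prev_error: Optional[float] = None

    def update(self, error: float, dt: float) -> float:
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        out = self.kp * error + self.ki * self.integral + self.kd * derivative
        return min(self.out_max, max(self.out_min, out))


def control_step(temps_c: List[float], setpoint_c: float, valve_pids: List[PID],
                 compressor_pid: PID, dt: float) -> Tuple[List[float], float]:
    """One control sample of both loops (hypothetical structure).

    Loop 1: each expansion valve opens further as its module runs warmer than
    the setpoint. Loop 2: the compressor acts only on modules whose valve is
    already fully open, i.e. whose cooling the valve alone can no longer raise.
    """
    errors = [t - setpoint_c for t in temps_c]
    openings = [pid.update(e, dt) for e, pid in zip(errors, valve_pids)]
    saturated = [e for e, o, pid in zip(errors, openings, valve_pids) if o >= pid.out_max]
    # With no saturated valve the compressor sees zero error and, through its
    # integral term, holds its current speed increment.
    compressor_error = max(saturated) if saturated else 0.0
    return openings, compressor_pid.update(compressor_error, dt)
```

In this sketch the compressor loop stays idle while every valve still has headroom, which mirrors the claim's ordering: valves first, compressor speed only once a valve is saturated.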

2. The method of claim 1 further comprising:

controlling a blower speed for the airflow that cools a refrigerant condenser of the single refrigerant unit if a tube temperature entering the condenser, as recorded by a thermal sensor, differs from a temperature of air exiting the condenser, indicative of at least one of an improper and an unstable cooling condition.

3. The method of claim 2, wherein the blower speed is increased when the tube temperature is cooler than the temperature of air exiting the condenser.
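The blower rule of claims 2 and 3 reduces to a comparison of two temperatures. The sketch below implements only the case claim 3 specifies (tube cooler than exiting air raises blower speed) and otherwise holds speed; the function name, tolerance, step size, and RPM limits are hypothetical.

```python
def adjust_blower_rpm(rpm: float, tube_temp_c: float, air_exit_temp_c: float,
                      tolerance_c: float = 0.5, step_rpm: float = 100.0,
                      min_rpm: float = 800.0, max_rpm: float = 3000.0) -> float:
    """Raise the condenser blower speed when the tube entering the condenser
    reads cooler than the air leaving it by more than tolerance_c; otherwise
    hold the current speed. All thresholds and limits are hypothetical."""
    if tube_temp_c < air_exit_temp_c - tolerance_c:
        rpm += step_rpm
    # Keep the command inside the blower's (assumed) operating range.
    return max(min_rpm, min(max_rpm, rpm))
```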

4. The method of claim 1 further comprising:

monitoring the cooling state of each logic module of the multiple logic modules along with error registers to denote any cooling hardware failures of the single refrigerant unit not yet repaired.

5. The method of claim 4, wherein the monitoring is done through redundant thermal sensors directly monitoring a region representative of circuit temperatures of a corresponding logic module.

6. The method of claim 5, wherein the region corresponds with one of a hat, substrate, and individual chips of a multi chip module (MCM).

7. The method of claim 5, wherein the thermal sensors are compared for at least one of miscompare properties and insanity limits to check accuracy of each measured temperature.

8. The method of claim 5, wherein the thermal sensors include a first thermal sensor sensed by the refrigerant unit and second and third thermal sensors read by a power supply supplying power to the multiple logic modules to ensure at least one of full redundancy and accuracy.

9. The method of claim 8, wherein the second and third thermal sensors are compared to each other and to the first thermal sensor versus miscompare limits, the second and third thermal sensors providing thermal protection of the multiple logic modules by dropping power if at least one of the second and third thermal sensors indicates a temperature corresponding to a damage limit.
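The three-sensor cross-check of claims 7 to 9 can be sketched as a range (sanity) check, a pairwise miscompare check, and a power-drop decision driven only by the two power-supply sensors. The names `check_sensors` and `SensorVerdict` and all numeric limits are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Tuple


@dataclass(frozen=True)
class SensorVerdict:
    readings_valid: bool  # all readings in the sane range and mutually consistent
    drop_power: bool      # a power-supply sensor has reached the damage limit


def check_sensors(t1_c: float, t2_c: float, t3_c: float,
                  miscompare_limit_c: float = 3.0,
                  sane_range_c: Tuple[float, float] = (-20.0, 120.0),
                  damage_limit_c: float = 85.0) -> SensorVerdict:
    """t1 is read by the refrigeration unit; t2 and t3 by the power supply."""
    lo, hi = sane_range_c
    readings = (t1_c, t2_c, t3_c)
    sane = all(lo <= t <= hi for t in readings)
    # Every pair of sensors must agree within the miscompare limit.
    consistent = all(abs(a - b) <= miscompare_limit_c for a, b in combinations(readings, 2))
    # Only the power-supply sensors (t2, t3) can trigger a power drop.
    drop = t2_c >= damage_limit_c or t3_c >= damage_limit_c
    return SensorVerdict(readings_valid=sane and consistent, drop_power=drop)
```

Keeping the power-drop decision on the power-supply sensors alone means thermal protection does not depend on the refrigeration unit, which is the redundancy the claims describe.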

10. The method of claim 1, wherein the first PID loop control opens an electronically controlled expansion valve when a temperature of a corresponding logic module is higher than targeted and closes the valve when the temperature is cooler than targeted.

11. The method of claim 1, wherein the refrigerant unit includes a condenser, the electronically controlled expansion valves, the compressor, and a controller providing control signals to the expansion valves all contained within a modular refrigeration unit, and each expansion valve is proximate a corresponding evaporator in thermal communication with a respective logic module.

12. The method of claim 11, wherein the hybrid cooling system includes liquid cooling provided by the refrigerant unit and the backup air cooling provided by heat sink fins in thermal communication with each corresponding evaporator.

13. A method to determine a proper clock cycle time for multiple logic modules with different heat loads while maintaining the clock cycle time as fast as viable, the method comprising:

determining a thermal state of each logic module of the multiple logic modules, each thermal state defined by a discrete temperature range associated with a clock speed predetermined to be a proper clock cycle time for the temperature range; and

determining whether a primary cooling means has been repaired.

14. The method of claim 13 further comprising:

turning on a backup air cooling fan if a temperature of any of the multiple logic modules is above acceptable levels of cooling by the primary cooling means.

15. The method of claim 14 further comprising:

controlling a fan speed of the backup cooling fan to prevent oscillation between thermal states.

16. The method of claim 13 further comprising:

increasing a voltage applied to a logic module to optimally use at least one of available cooling and power when operating in a backup cooling mode for maximum clock speed at a given temperature.

17. The method of claim 13 further comprising:

decreasing a voltage applied to a logic module when operating in a backup cooling mode and at least one of cooling and power is unavailable, to reduce the leakage currents generated at warmer, degraded temperatures.

18. The method of claim 13, wherein said determining a thermal state of each logic module of the multiple logic modules is done through redundant thermal sensors directly monitoring a region representative of circuit temperatures of a corresponding logic module to provide at least one of thermal protection and redundancy to guide cooling control.

19. The method of claim 18, wherein the region corresponds with one of a hat, substrate, and individual chips of a multi chip module (MCM).

20. The method of claim 18, wherein the thermal sensors are compared for at least one of miscompare properties and insanity limits to check accuracy of each measured temperature.

21. The method of claim 20, wherein the thermal sensors include a first thermal sensor sensed by the refrigerant unit and second and third thermal sensors read by a power supply supplying power to the multiple logic modules to ensure at least one of full redundancy and accuracy.

22. The method of claim 21, wherein the second and third thermal sensors are compared to each other and to the first thermal sensor, the second and third thermal sensors providing thermal protection of the multiple logic modules by dropping power if at least one of the second and third thermal sensors indicates a temperature corresponding to a damage limit.

23. The method of claim 14 further comprising:

operating the backup air cooling fans in a manner to ensure that the fans always turn on if the primary cooling means has failed; and

operating the backup air cooling fans in a manner to ensure that the fans do not come on so soon as to cause an oscillation of a cooling state when the primary cooling means has failed.
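Both requirements of claim 23 can be met by a delayed, latching trigger: start the fans at once above a hard limit, otherwise only after the temperature has stayed elevated for a hold time. The class name, thresholds, hold time, and the choice to latch the fans on until a repair is confirmed are hypothetical.

```python
from typing import Optional


class BackupFanTrigger:
    """Decides when hypothetical backup air-cooling fans should run.

    At or above hard_limit_c the fans start immediately, so they always come on
    when primary cooling has failed. Between on_threshold_c and hard_limit_c
    they start only after the temperature has stayed elevated for hold_s
    seconds, so a brief excursion does not flip the cooling state back and
    forth. Once on, the fans stay latched until reset_after_repair() is called.
    """

    def __init__(self, on_threshold_c: float, hold_s: float, hard_limit_c: float):
        self.on_threshold_c = on_threshold_c
        self.hold_s = hold_s
        self.hard_limit_c = hard_limit_c
        self.running = False
        self._above_since: Optional[float] = None

    def update(self, temp_c: float, now_s: float) -> bool:
        if self.running:
            return True
        if temp_c >= self.hard_limit_c:
            self.running = True
        elif temp_c >= self.on_threshold_c:
            if self._above_since is None:
                self._above_since = now_s
            if now_s - self._above_since >= self.hold_s:
                self.running = True
        else:
            self._above_since = None  # excursion ended; restart the hold timer
        return self.running

    def reset_after_repair(self) -> None:
        self.running = False
        self._above_since = None
```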

24. A method to initialize the logic clocks for multiple logic modules in a fail-safe parallel manner, the method comprising:

cooling the multiple logic modules with a hybrid cooling system, the hybrid cooling system includes a refrigerant unit as a primary cooling means and backup air cooling as a secondary cooling means; and

issuing parallel “pre-cooling” commands to each logic module cooled by the refrigerant unit that allow the primary cooling means a head start in cooling prior to the logic clocks being turned on.

25. The method of claim 24 further comprising:

signaling that a pre-cooling temperature state has been reached; and

initiating initial microcode load (IML).
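The pre-cooling sequence of claims 24 and 25 can be sketched with Python's standard `concurrent.futures` thread pool: every module's pre-cooling command is issued in parallel, and the IML starts only after every module reports that its pre-cooling state was reached. The per-module command callables and the `start_iml` hook are hypothetical stand-ins for the real hardware interfaces.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def precool_then_iml(precool_commands: Dict[str, Callable[[], bool]],
                     start_iml: Callable[[], None]) -> bool:
    """Issue each module's (hypothetical) pre-cooling command in parallel and
    start the IML only if every module signals that its pre-cooling
    temperature state has been reached."""
    with ThreadPoolExecutor(max_workers=max(1, len(precool_commands))) as pool:
        futures = {name: pool.submit(cmd) for name, cmd in precool_commands.items()}
        reached = {name: f.result() for name, f in futures.items()}
    if all(reached.values()):
        start_iml()
        return True
    return False
```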

26. The method of claim 25 further comprising:

resetting an integral term in a PID controller when the logic clocks are sensed to have first come on to improve the refrigerant unit cooling accuracy.
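Claim 26's integral reset can be sketched as a wrapper that clears a PI controller's integral on the first sample at which the logic clocks are seen to be on. A plausible motivation, stated here as an assumption, is that integral error accumulated during pre-cooling, before the logic draws its full heat load, would otherwise bias the valve once the clocks start. `PIController` and `ClockAwareLoop` are hypothetical names.

```python
class PIController:
    """Minimal PI controller whose integral term can be cleared on demand."""

    def __init__(self, kp: float, ki: float):
        self.kp = kp
        self.ki = ki
        self.integral = 0.0

    def update(self, error: float, dt: float) -> float:
        self.integral += error * dt
        return self.kp * error + self.ki * self.integral


class ClockAwareLoop:
    """Clears the controller's integral on a rising edge of clocks_on."""

    def __init__(self, controller: PIController):
        self.controller = controller
        self._clocks_were_on = False

    def step(self, error: float, dt: float, clocks_on: bool) -> float:
        if clocks_on and not self._clocks_were_on:
            self.controller.integral = 0.0  # discard pre-cooling history
        self._clocks_were_on = clocks_on
        return self.controller.update(error, dt)
```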

27. The method of claim 26 further comprising:

setting and verifying current phase lock loop patterns before any change is attempted with respect to the logic clocks.

28. The method of claim 27 further comprising:

signaling any change in a cooling state of any of the multiple logic modules with interrupts.

29. The method of claim 28 further comprising:

reviewing the cooling state of each logic module of the multiple logic modules running in a given server; and

adjusting the clock speed target based on a logic module having the most thermal degradation.

30. The method of claim 29 further comprising:

incrementing the clock speed using a two-step method for phase lock loops (PLLs) that always remembers and verifies a current setting before initiating a new setting.
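The remember-and-verify discipline of claims 27 and 30 can be sketched as two steps: read and verify the current PLL setting against the expected value, then write the new setting and read it back, falling back to the remembered value if the readback fails. `SimulatedPLL` is a hypothetical stand-in for a real PLL register interface.

```python
class SimulatedPLL:
    """Hypothetical stand-in for a PLL register interface."""

    def __init__(self, setting: int, faulty: bool = False):
        self._setting = setting
        self._faulty = faulty

    def read(self) -> int:
        return self._setting

    def write(self, value: int) -> None:
        # A faulty PLL silently ignores writes, so readback verification fails.
        if not self._faulty:
            self._setting = value


def two_step_change(pll: SimulatedPLL, expected_current: int, new_setting: int) -> bool:
    # Step 1: remember the current setting and verify it before changing anything.
    current = pll.read()
    if current != expected_current:
        return False
    # Step 2: apply the new setting and verify that it took effect.
    pll.write(new_setting)
    if pll.read() != new_setting:
        pll.write(current)  # fall back to the remembered setting
        return False
    return True
```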

31. The method of claim 30 further comprising:

dividing an entire temperature operating range of the multiple logic modules, from a normal operating temperature to a near hardware damage temperature at which the logic modules are powered off, into continuous, programmable regions known as “cooling states,” wherein within each cooling state exists a single “optimized” clock speed that is maintained while in a given cooling state.

32. The method of claim 31, wherein each temperature range defining a corresponding cooling state has hysteresis built in to prevent oscillation between clock speeds.
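The cooling states of claims 31 and 32, together with claim 29's rule of following the most degraded module, can be sketched as a state machine over ascending temperature boundaries. Entering a warmer state happens as soon as a boundary is crossed, while returning to a cooler one requires falling a hysteresis margin below that boundary. The boundaries, hysteresis, and per-state clock table are hypothetical programmable values.

```python
from typing import List, Sequence


def next_cooling_state(current: int, temp_c: float,
                       upper_bounds_c: Sequence[float], hysteresis_c: float) -> int:
    """upper_bounds_c[i] is the upper temperature boundary of cooling state i
    (ascending). State 0 is normal; state len(upper_bounds_c) is the warmest."""
    state = current
    # Warm up immediately when a boundary is crossed.
    while state < len(upper_bounds_c) and temp_c > upper_bounds_c[state]:
        state += 1
    # Cool down only after dropping hysteresis_c below the boundary.
    while state > 0 and temp_c < upper_bounds_c[state - 1] - hysteresis_c:
        state -= 1
    return state


def clock_target_mhz(module_states: List[int], clock_by_state_mhz: Sequence[float]) -> float:
    """The system clock follows the module with the most thermal degradation."""
    return clock_by_state_mhz[max(module_states)]
```

With boundaries at 30, 40, and 50 °C and 2 °C of hysteresis, a module that entered state 1 at 31 °C stays there at 29 °C and only returns to state 0 below 28 °C, so a temperature hovering near a boundary does not toggle the clock speed.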

33. The method of claim 30, wherein the incrementing includes incrementing several clocks affecting one processor system such that an optimum ratio of clock speeds is always maintained.

34. The method of claim 33, wherein a minimum deviation from a known "ideal clock ratio" is always maintained between any two clocks.

35. The method of claim 34 further comprising:

using a maximum clock increment smaller than an increment that could cause the PLLs to lose lock due to noise introduced to at least one of an oscillator and a chip having the PLL.

36. The method of claim 35, wherein a total clock increment between two cooling states is achieved in a stepwise series of "pseudo-linear" increments of clocks.

37. The method of claim 35 further comprising:

changing the clocks of multiple clock boundaries with different oscillators and different frequencies, always maintaining ideal clock ratios and minimum increments, when suitable for best performance in a new temperature environment.
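Claims 33 to 37 combine two ideas: move between cooling states in equal, "pseudo-linear" steps no larger than a safe maximum increment, and at every intermediate step keep all related clocks at their ideal ratios. A minimal sketch, assuming each clock is derived from one reference clock by a fixed ratio and rounded to its oscillator's (hypothetical) frequency resolution:

```python
import math
from typing import Dict, List


def coordinated_ramp(start_mhz: float, target_mhz: float, max_step_mhz: float,
                     ratios: Dict[str, float],
                     resolution_mhz: Dict[str, float]) -> List[Dict[str, float]]:
    """Plan a move of the reference clock from start_mhz to target_mhz in equal
    steps no larger than max_step_mhz. At every step each clock is set to
    reference * ideal ratio, rounded to the nearest frequency its oscillator
    can produce, so ratio deviation stays small at every intermediate point."""
    if max_step_mhz <= 0:
        raise ValueError("max_step_mhz must be positive")
    delta = target_mhz - start_mhz
    n_steps = max(1, math.ceil(abs(delta) / max_step_mhz))
    plan = []
    for i in range(1, n_steps + 1):
        ref = start_mhz + delta * i / n_steps
        plan.append({name: round(ref * ratio / resolution_mhz[name]) * resolution_mhz[name]
                     for name, ratio in ratios.items()})
    return plan
```

For example, moving a reference clock from 1000 MHz to 1100 MHz with a 50 MHz maximum step and a bus clock at half the reference yields two steps, (1050, 525) and (1100, 550). The equal step size is one simple way to make the increments "pseudo-linear"; the claims do not specify how the steps are sized.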

38. The method of claim 31 further comprising:

verifying at each system IML that the phase lock loops are capable of operating at every possible clock speed inside the range of operation.

39. The method of claim 38 further comprising:

using product cooling hardware including one of refrigerant, water, and air cooling to temperature bias stress test the logic modules prior to shipping.

40. The method of claim 35, wherein if a problem with the cooling system is present at system power-on and during the IML initialization stage, the IML happens in a degraded cooling state.

41. The method of claim 40, wherein after the cooling problem is repaired, the server does not speed up to its fastest clock speed to limit risk to an elastic interface.

42. The method of claim 30, further comprising:

storing and updating clock data in an EEPROM for later use in clock adjustments.

43. The method of claim 40 further comprising:

alerting an operator when the system is operating in a given degraded cooling range; and

notifying the operator not to re-IML until the cooling problem is serviced.

44. The method of claim 42, further comprising:

executing a repair and verify procedure configured to automatically remove error registers when the cooling problem has been repaired; and

allowing the server to increase its clock speed until it reaches a clock speed at which it was IMLed.